RustFFT is a high-performance FFT library written in pure Rust.
On x86_64, RustFFT supports the AVX instruction set for increased performance. No special code is needed to activate AVX: simply plan an FFT using the FftPlanner on a machine that supports the `avx` and `fma` CPU features, and RustFFT will automatically switch to faster AVX-accelerated algorithms.
For machines that do not have AVX, RustFFT also supports the SSE4.1 instruction set. As with AVX, this is enabled automatically when using the FftPlanner.
Additionally, there is automatic support for the Neon instruction set on AArch64, and support for WASM SIMD when compiling for WASM targets.
§Usage
The recommended way to use RustFFT is to create a FftPlanner instance and then call its plan_fft method. This method will automatically choose which FFT algorithms are best for a given size and initialize the required buffers and precomputed data.
```rust
// Perform a forward FFT of size 1234
use rustfft::{FftPlanner, num_complex::Complex};

let mut planner = FftPlanner::new();
let fft = planner.plan_fft_forward(1234);

let mut buffer = vec![Complex{ re: 0.0f32, im: 0.0f32 }; 1234];
fft.process(&mut buffer);
```
The planner returns trait objects of the Fft trait, allowing for FFT sizes that aren’t known until runtime.
RustFFT also exposes individual FFT algorithms. For example, if you know beforehand that you need a power-of-two FFT, you can avoid the overhead of the planner and trait object by directly creating instances of the Radix4 algorithm:
```rust
// Computes a forward FFT of size 4096
use rustfft::{Fft, FftDirection, num_complex::Complex, algorithm::Radix4};

let fft = Radix4::new(4096, FftDirection::Forward);

let mut buffer = vec![Complex{ re: 0.0f32, im: 0.0f32 }; 4096];
fft.process(&mut buffer);
```
For the vast majority of situations, simply using the FftPlanner will be enough, but advanced users may have better insight than the planner into which algorithms are best for a specific size. See the algorithm module for a complete list of scalar algorithms implemented by RustFFT.

Users should beware, however, that bypassing the planner will disable all AVX, SSE, Neon, and WASM SIMD optimizations.
§Feature Flags
- `avx` (Enabled by default): On x86_64, the `avx` feature enables compilation of AVX-accelerated code. Enabling it greatly improves performance if the client CPU supports AVX and FMA, while disabling it reduces compile time and binary size. On every platform besides x86_64, this feature does nothing, and RustFFT will behave like it’s not set.
- `sse` (Enabled by default): On x86_64, the `sse` feature enables compilation of SSE4.1-accelerated code. Enabling it improves performance if the client CPU supports SSE4.1, while disabling it reduces compile time and binary size. If AVX is also supported and its feature flag is enabled, RustFFT will use AVX instead of SSE4.1. On every platform besides x86_64, this feature does nothing, and RustFFT will behave like it’s not set.
- `neon` (Enabled by default): On AArch64 (64-bit ARM), the `neon` feature enables compilation of Neon-accelerated code. Enabling it improves performance, while disabling it reduces compile time and binary size. On every platform besides AArch64, this feature does nothing, and RustFFT will behave like it’s not set.
- `wasm_simd` (Disabled by default): On the WASM platform, this feature enables compilation of WASM SIMD-accelerated code. To execute binaries compiled with `wasm_simd`, you need a target browser or runtime which supports fixed-width SIMD. If you run SIMD-accelerated code on an unsupported platform, WebAssembly will specify a trap, leading to immediate cancelation of execution. On every platform besides WASM, this feature does nothing, and RustFFT will behave like it is not set.
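As an illustration (the version number below is only an example), a project can opt out of the default SIMD features to shrink compile time and binary size, or opt into `wasm_simd`, via its `Cargo.toml`:

```toml
[dependencies]
# Disable the default `avx`, `sse`, and `neon` features:
rustfft = { version = "6", default-features = false }

# Or opt into WASM SIMD in addition to the defaults:
# rustfft = { version = "6", features = ["wasm_simd"] }
```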
§Normalization
RustFFT does not normalize outputs. Callers must manually normalize the results by scaling each element by `1/len().sqrt()`. Multiple normalization steps can be merged into one via pairwise multiplication, so when doing a forward FFT followed by an inverse FFT, callers can normalize once by scaling each element by `1/len()`.
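The merged normalization step can be sketched without any dependencies, using plain `(re, im)` tuples in place of `num_complex::Complex` (a round trip through a forward and inverse FFT scales every element by `len()`):

```rust
// Undo the combined scaling of a forward FFT followed by an inverse FFT,
// which multiplies every element by the buffer length.
fn normalize(buffer: &mut [(f32, f32)]) {
    let scale = 1.0 / buffer.len() as f32;
    for v in buffer.iter_mut() {
        v.0 *= scale;
        v.1 *= scale;
    }
}

fn main() {
    // Pretend `buffer` came back from forward + inverse FFTs of size 4,
    // so each element is the original value times 4:
    let mut buffer = vec![(4.0f32, 8.0), (12.0, 0.0), (0.0, -4.0), (4.0, 4.0)];
    normalize(&mut buffer);
    assert_eq!(buffer, vec![(1.0, 2.0), (3.0, 0.0), (0.0, -1.0), (1.0, 1.0)]);
}
```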
§Output Order
Elements in the output are ordered by ascending frequency, with the first element corresponding to frequency 0.
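Under the usual DFT convention, output index `k` corresponds to frequency `k * sample_rate / len` for the first half of the buffer, while indices above `len / 2` wrap around to negative frequencies. A dependency-free sketch of that mapping (`bin_frequency` is an illustrative helper, not part of RustFFT’s API):

```rust
// Map an output index to the frequency (in Hz) it represents, given the
// sample rate. Indices above len / 2 correspond to negative frequencies.
fn bin_frequency(index: usize, len: usize, sample_rate: f64) -> f64 {
    let k = if index <= len / 2 {
        index as f64
    } else {
        index as f64 - len as f64
    };
    k * sample_rate / len as f64
}

fn main() {
    // For a 1024-point FFT of audio sampled at 48 kHz:
    assert_eq!(bin_frequency(0, 1024, 48_000.0), 0.0); // DC component
    assert_eq!(bin_frequency(1, 1024, 48_000.0), 46.875); // bin spacing is fs/N
    assert_eq!(bin_frequency(1023, 1024, 48_000.0), -46.875); // just below 0 Hz
}
```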
§AVX Performance Tips
In any FFT computation, the time required to compute an FFT of size N depends heavily on the prime factorization of N. If N’s prime factors are all very small, computing an FFT of size N will be fast; it will be slow if N has large prime factors, or if N is a prime number.

In most FFT libraries (including RustFFT when using non-AVX code), power-of-two FFT sizes are the fastest, and users see a steep falloff in performance when using non-power-of-two sizes. Thankfully, RustFFT using AVX acceleration is not quite as restrictive:
- Any FFT whose size is of the form `2^n * 3^m` can be considered the “fastest” in RustFFT.
- Any FFT whose prime factors are all 11 or smaller will also be very fast, but the fewer factors of 2 and 3 it has, the slower it will be. For example, computing a FFT of size 13552 `(2^4 * 7 * 11 * 11)` takes 12% longer than one of size 13824 `(2^9 * 3^3)`, and computing a FFT of size 2541 `(3 * 7 * 11 * 11)` takes 65% longer than one of size 2592 `(2^5 * 3^4)`.
- Any other FFT size will be noticeably slower. A considerable amount of effort has been put into making these FFT sizes as fast as they can be, but some FFT sizes just take more work than others. For example, computing a FFT of size 5183 `(71 * 73)` takes about 5x longer than computing a FFT of size 5184 `(2^6 * 3^4)`.
In most cases, even prime-sized FFTs will be fast enough for your application. In the example of 5183 above, even that “slow” FFT only takes a few tens of microseconds to compute.
Some applications of the FFT allow for choosing an arbitrary FFT size (In many applications the size is pre-determined by whatever you’re computing). If your application supports choosing your own size, our advice is still to start by trying the size that’s most convenient to your application. If that’s too slow, see if you can find a nearby size whose prime factors are all 11 or smaller, and you can expect a 2x-5x speedup. If that’s still too slow, find a nearby size whose prime factors are all 2 or 3, and you can expect a 1.1x-1.5x speedup.
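That search for a nearby fast size can be sketched as a small helper (a dependency-free illustration, not part of RustFFT’s API):

```rust
// Find the smallest size >= n whose prime factors are all <= max_prime,
// e.g. max_prime = 11 for the "very fast" AVX sizes described above,
// or max_prime = 3 for the fastest `2^n * 3^m` sizes.
fn next_smooth(n: usize, max_prime: usize) -> usize {
    fn is_smooth(mut x: usize, max_prime: usize) -> bool {
        // Strip out every prime factor up to max_prime; if nothing is
        // left over, all of x's prime factors were small enough.
        let mut p = 2;
        while p <= max_prime && x > 1 {
            while x % p == 0 {
                x /= p;
            }
            p += 1;
        }
        x == 1
    }
    (n..).find(|&x| is_smooth(x, max_prime)).unwrap()
}

fn main() {
    // 5183 = 71 * 73 is a slow size; the nearest 11-smooth size at or above it:
    assert_eq!(next_smooth(5183, 11), 5184); // 2^6 * 3^4
    // The nearest {2, 3}-smooth size at or above 2541 = 3 * 7 * 11 * 11:
    assert_eq!(next_smooth(2541, 3), 2592); // 2^5 * 3^4
}
```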
Re-exports§
pub use num_complex;
pub use num_traits;
Modules§
- Individual FFT algorithms
Structs§
- The FFT planner creates new FFT algorithm instances.
- The AVX FFT planner creates new FFT algorithm instances which take advantage of the AVX instruction set.
- The Neon FFT planner creates new FFT algorithm instances using a mix of scalar and Neon accelerated algorithms. It is supported when using the 64-bit AArch64 instruction set.
- The Scalar FFT planner creates new FFT algorithm instances using non-SIMD algorithms.
- The SSE FFT planner creates new FFT algorithm instances using a mix of scalar and SSE accelerated algorithms. It requires at least SSE4.1, which is available on all reasonably recent x86_64 cpus.
- The WASM FFT planner creates new FFT algorithm instances using a mix of scalar and WASM SIMD accelerated algorithms. It is supported when using fairly recent browser versions as outlined in the WebAssembly roadmap.
Enums§
- Represents an FFT direction, i.e. a forward FFT or an inverse FFT
Traits§
- A trait that allows FFT algorithms to report whether they compute forward FFTs or inverse FFTs
- Trait for algorithms that compute FFTs.
- Generic floating point number, implemented for f32 and f64
- A trait that allows FFT algorithms to report their expected input/output size